
Add workflow to refresh jsonschema_for_docs.json after each release #5200

Merged
shreyas-goenka merged 8 commits into main from workflow/update-schema-docs-on-release
May 7, 2026

Contributor

@shreyas-goenka shreyas-goenka commented May 6, 2026

Summary

  • New .github/workflows/update-schema-docs.yml triggers on v* tag pushes (and manual workflow_dispatch, which auto-detects the latest v* tag).
  • Regenerates bundle/schema/jsonschema_for_docs.json from main (full tag history is required so since_version.go can stamp x-since-version), asserts only that file changed, and pushes the result to the dedicated docgen branch. main is never modified.
  • docgen was bootstrapped as an orphan branch containing only README.md; the workflow adds bundle/schema/jsonschema_for_docs.json and updates it on every release.
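A minimal sketch of the "only that file changed" assertion from the summary above, assuming the check is driven by `git status --porcelain` output. The helper name `only_expected_changed` is hypothetical, and the workflow's actual implementation may differ.

```shell
# Hypothetical helper (not taken from the PR): reads `git status --porcelain`
# output on stdin and succeeds only if every changed path equals the expected
# file. Assumes porcelain v1 lines of the form "XY <path>" with unquoted paths.
only_expected_changed() {
  expected="$1"
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    # Drop the two status characters and the separating space.
    path="${line#???}"
    if [ "$path" != "$expected" ]; then
      echo "unexpected change: $path" >&2
      return 1
    fi
  done
  return 0
}
```

In a checkout it could be used as `git status --porcelain | only_expected_changed bundle/schema/jsonschema_for_docs.json`.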

Why

bundle/internal/schema/since_version.go derives x-since-version from git tag --list 'v*' at generation time, so the committed file becomes stale the moment the next tag is pushed. This workflow keeps a clean publish branch (docgen) current automatically, decoupled from main.

End-to-end test

Triggered the workflow via a temporary branch trigger and verified that the file landed on docgen with up-to-date since-versions:

$ curl -sfL https://raw.githubusercontent.com/databricks/cli/docgen/bundle/schema/jsonschema_for_docs.json \
    | grep -o '"x-since-version": *"v[^"]*"' | sort | uniq -c | sort -rn | head -5
 631 "x-since-version": "v0.229.0"
  54 "x-since-version": "v0.298.0"
  46 "x-since-version": "v0.287.0"
  46 "x-since-version": "v0.279.0"
  31 "x-since-version": "v0.260.0"

A subsequent run (no schema change) correctly logs "docgen already up to date for v0.299.0; nothing to commit."
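The same check can be run against a local copy of the schema. This is a sketch that wraps the grep pipeline above in a function. The name `since_version_histogram` is hypothetical, and the sed and awk steps are additions that only tidy the output.

```shell
# Hypothetical helper (not part of the PR): prints "<count> <version>" lines,
# most frequent first, for every x-since-version stamp found in the file
# passed as $1.
since_version_histogram() {
  grep -o '"x-since-version": *"v[^"]*"' "$1" \
    | sed 's/.*"\(v[^"]*\)"$/\1/' \
    | sort | uniq -c | sort -rn \
    | awk '{print $1, $2}'
}
```

For example, `since_version_histogram bundle/schema/jsonschema_for_docs.json | head -5` lists the five most common versions.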

`bundle/internal/schema/since_version.go` reads `git tag --list 'v*'` to
compute `x-since-version` annotations. The committed file therefore goes
stale by one release as soon as the next tag is pushed: fields shipped
in that tag don't get stamped until the schema is regenerated against a
tag list that includes the new tag.

The new workflow runs on every `v*` tag push (and via workflow_dispatch),
regenerates the file from `main`, asserts that nothing other than
`bundle/schema/jsonschema_for_docs.json` changed, and pushes the update
directly to `main`.

Co-authored-by: Isaac
main remains untouched. The workflow regenerates the schema in a main
checkout (full history + tags so since_version.go can stamp), copies the
result into a worktree on the docgen branch, and pushes there.

workflow_dispatch no longer takes a tag input; it picks up the most
recent v* tag automatically.

Co-authored-by: Isaac
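A rough sketch of the copy step described above, combined with the skip-when-unchanged behavior seen in the end-to-end test. The helper `sync_schema` and the byte-for-byte comparison with `cmp` are assumptions. The real workflow may detect a no-op differently, for example by asking git whether the worktree is dirty.

```shell
# Hypothetical helper (not taken from the PR): copies the regenerated schema
# ($1) into the docgen worktree path ($2) only when the content differs.
# Returns 0 if the destination was written, 1 if it was already up to date.
sync_schema() {
  src="$1"
  dst="$2"
  if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
    return 1
  fi
  mkdir -p "$(dirname "$dst")"
  cp "$src" "$dst"
}
```

A caller would commit and push to docgen only when `sync_schema` returns 0.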
A branch push left GITHUB_REF starting with refs/heads/, so the strip
was a no-op and the wrong value ended up in the commit message.
ref_type/ref_name are unambiguous.

Co-authored-by: Isaac
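To illustrate the bug described in this commit message: the exact strip expression from the earlier revision is not shown in the PR, so `strip_tag_prefix` below is an assumed reconstruction, and `resolve_tag` is a hypothetical stand-in for the ref_type/ref_name approach.

```shell
# Assumed shape of the earlier approach: strip a refs/tags/ prefix from
# GITHUB_REF. On a branch push the prefix is refs/heads/, so nothing is removed.
strip_tag_prefix() {
  printf '%s\n' "${1#refs/tags/}"
}

# Hypothetical replacement: decide from the ref type and ref name, which are
# passed in as $1 and $2. Fails when the run was not triggered by a tag, so the
# caller can fall back to the most recent v* tag.
resolve_tag() {
  ref_type="$1"
  ref_name="$2"
  if [ "$ref_type" = "tag" ]; then
    printf '%s\n' "$ref_name"
  else
    return 1
  fi
}
```

With `refs/heads/some-branch` as input, `strip_tag_prefix` returns the input unchanged, which is how a branch ref could end up in the commit message.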
@shreyas-goenka shreyas-goenka marked this pull request as ready for review May 6, 2026 19:28
@shreyas-goenka shreyas-goenka requested a review from pietern May 6, 2026 19:28
Contributor

@janniklasrose janniklasrose left a comment


Let's add branch protection for the docgen branch?

on:
  push:
    tags:
      - "v*"
Contributor

consider more scoped matching

Suggested change
-      - "v*"
+      - "v[0-9]+.[0-9]+.[0-9]+*"

Contributor Author

Done in 6341c63. Used the suggested pattern v[0-9]+.[0-9]+.[0-9]+* for the trigger.

if [ "$REF_TYPE" = "tag" ]; then
  tag="$REF_NAME"
else
  tag=$(git tag --list 'v*' --sort=-version:refname | head -n 1)
Contributor

ditto about tag matching

Contributor Author

Done in 6341c63. git tag --list uses fnmatch (no +), so I match the same shape via a grep -E post-filter:

tag=$(git tag --list "v*" --sort=-version:refname | grep -E "^v[0-9]+\.[0-9]+\.[0-9]+" | head -n 1)
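The post-filter can be tried without a repository by feeding it a tag list directly. This is a small sketch. The function name `latest_release_tag` is hypothetical, and it assumes its input is already sorted newest first, as `--sort=-version:refname` would produce.

```shell
# Hypothetical wrapper around the grep -E post-filter: reads candidate tags on
# stdin (newest first) and prints the first one that starts with
# v<digits>.<digits>.<digits>.
latest_release_tag() {
  grep -E '^v[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}
```

For instance, `git tag --list 'v*' --sort=-version:refname | latest_release_tag` skips a tag such as `vnext`. The expression is not anchored at the end, so suffixed tags (for example a hypothetical `v1.2.3-rc1`) still pass, mirroring the trailing `*` in the workflow trigger pattern.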

@shreyas-goenka
Contributor Author

@janniklasrose proposed branch protection for docgen (to apply via Settings → Rules → Rulesets):

  • Target: branch refs/heads/docgen
  • Rules:
    • creation (no recreate after deletion)
    • deletion (no delete)
    • non_fast_forward (no force-push)
    • pull_request — require 1 approval; allowed_merge_methods: [merge, rebase, squash]
  • Bypass actors: GitHub Actions integration (actor_id: 15368, bypass_mode: always) — so the workflow’s GITHUB_TOKEN can still push directly. Without this, the workflow would fail.

Equivalent JSON (for the REST API, kept here for reference):

{
  "name": "docgen: protected publish branch",
  "target": "branch",
  "enforcement": "active",
  "conditions": {"ref_name": {"include": ["refs/heads/docgen"], "exclude": []}},
  "rules": [
    {"type": "creation"},
    {"type": "deletion"},
    {"type": "non_fast_forward"},
    {"type": "pull_request", "parameters": {
      "required_approving_review_count": 1,
      "dismiss_stale_reviews_on_push": false,
      "require_code_owner_review": false,
      "require_last_push_approval": false,
      "required_review_thread_resolution": false,
      "allowed_merge_methods": ["merge", "rebase", "squash"]
    }}
  ],
  "bypass_actors": [
    {"actor_id": 15368, "actor_type": "Integration", "bypass_mode": "always"}
  ]
}

I’m applying this via the UI separately since it touches repo security config.

@shreyas-goenka
Contributor Author

I cannot figure out how to make a GitHub ruleset work so that only the action can modify the branch (without creating a dedicated GitHub App). It's not worth it; let's just skip the branch protection.

The worst that can happen is that someone temporarily breaks the documentation generator by accidentally writing to this branch. I think it's fine without branch protection.

@shreyas-goenka shreyas-goenka merged commit f506949 into main May 7, 2026
22 of 23 checks passed
@shreyas-goenka shreyas-goenka deleted the workflow/update-schema-docs-on-release branch May 7, 2026 09:58